| Research topic | Topic details |
|---|---|
| Can the price of a game be predicted x time from now? | When will it be most worthwhile to buy the game in the future |
| Can we predict when it is most worthwhile for the seller to put the game on sale? | When the seller should run a sale in order to bring in more players and keep selling at the greatest profit |

| Research data & analysis methods | Details |
|---|---|
| Data details | Financial details (the game sold for a given price at time X, from which we predict the price it will sell for at time Y), plus details about the game such as name, genre, and popularity. We will collect the data by crawling the isThereAnyDeal site (Fig. 3). |
| Analysis methods | We will use the tools learned during the course to process the data and learn from it the information we need for this purpose. We will analyze the DataFrame with variable-relationship tables and statistics, and finally try to train a machine learning model that predicts the date of the lowest price in a given year and the price that will bring the largest number of sales. |

| | steamId | title | history_link | type | name | steam_appid | required_age | is_free | controller_support | detailed_description | ... | supported_language.Italian | category.Against players (general screen) | category.General screen | genre.Casual games | genre.Race | genre.Animation & Modeling | genre.Education | genre.Software Training | genre.Utilities | genre.Game Development |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | game | Shift Happens | 359840 | 0 | False | full | <img src="https://cdn.cloudflare.steamstatic.c... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1174180 | Red Dead Redemption 2 | https://isthereanydeal.com/game/reddeadredempt... | game | Red Dead Redemption 2 | 1174180 | 0 | False | NaN | <h1>Ultimate Edition</h1><p><img src="https://... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1091500 | Cyberpunk 2077 | https://isthereanydeal.com/game/cyberpunkii0vi... | game | Cyberpunk 2077 | 1091500 | 18 | False | NaN | <h1>Check out other games from CD PROJEKT RED<... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 442070 | Drawful 2 | https://isthereanydeal.com/game/drawfulii/hist... | game | Drawful 2 | 442070 | 0 | False | full | Updated with awesome new features:<br />\r\nNo... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 21660 | Street Fighter IV | https://isthereanydeal.com/game/streetfighteri... | game | Street Fighter® IV | 21660 | 0 | False | NaN | Street Fighter® IV brings the legendary fighti... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 718 | 1004490 | Tools Up! | https://isthereanydeal.com/game/toolsup/history/ | game | Tools Up! | 1004490 | 0 | False | full | <h1>Chat with us on Discord</h1><p><a href="ht... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 719 | 305380 | Blue Estate The Game | https://isthereanydeal.com/game/blueestategame... | game | Blue Estate The Game | 305380 | 0 | False | NaN | <strong>Blue Estate</strong> provides previous... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 720 | 696530 | Lake Ridden | https://isthereanydeal.com/game/lakeridden/his... | game | Lake Ridden | 696530 | 0 | False | full | <h1>Lake Ridden Is LIVE!</h1><p>Hi all! Lake R... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 721 | 1421760 | What Comes After | https://isthereanydeal.com/game/whatcomesafter... | game | What Comes After | 1421760 | 0 | False | NaN | <h1>More from Rolling Glory Jam</h1><p><a href... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 722 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | game | Age of Barbarian Extended Cut | 402880 | 18 | False | full | <strong>INTRODUCTION</strong><br><br><i>In a w... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
723 rows × 147 columns
save_df(games_dataframe, "raw_games_data_with_steam_details.csv")
games_dataframe = load_df("raw_games_data_with_steam_details.csv")
games_dataframe
| | steamId | title | history_link | type | name | steam_appid | required_age | is_free | controller_support | detailed_description | ... | supported_language.Italian | category.Against players (general screen) | category.General screen | genre.Casual games | genre.Race | genre.Animation & Modeling | genre.Education | genre.Software Training | genre.Utilities | genre.Game Development |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | game | Shift Happens | 359840 | 0 | False | full | <img src="https://cdn.cloudflare.steamstatic.c... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1174180 | Red Dead Redemption 2 | https://isthereanydeal.com/game/reddeadredempt... | game | Red Dead Redemption 2 | 1174180 | 0 | False | NaN | <h1>Ultimate Edition</h1><p><img src="https://... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1091500 | Cyberpunk 2077 | https://isthereanydeal.com/game/cyberpunkii0vi... | game | Cyberpunk 2077 | 1091500 | 18 | False | NaN | <h1>Check out other games from CD PROJEKT RED<... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 442070 | Drawful 2 | https://isthereanydeal.com/game/drawfulii/hist... | game | Drawful 2 | 442070 | 0 | False | full | Updated with awesome new features:<br />\r\nNo... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 21660 | Street Fighter IV | https://isthereanydeal.com/game/streetfighteri... | game | Street Fighter® IV | 21660 | 0 | False | NaN | Street Fighter® IV brings the legendary fighti... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 718 | 1004490 | Tools Up! | https://isthereanydeal.com/game/toolsup/history/ | game | Tools Up! | 1004490 | 0 | False | full | <h1>Chat with us on Discord</h1><p><a href="ht... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 719 | 305380 | Blue Estate The Game | https://isthereanydeal.com/game/blueestategame... | game | Blue Estate The Game | 305380 | 0 | False | NaN | <strong>Blue Estate</strong> provides previous... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 720 | 696530 | Lake Ridden | https://isthereanydeal.com/game/lakeridden/his... | game | Lake Ridden | 696530 | 0 | False | full | <h1>Lake Ridden Is LIVE!</h1><p>Hi all! Lake R... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 721 | 1421760 | What Comes After | https://isthereanydeal.com/game/whatcomesafter... | game | What Comes After | 1421760 | 0 | False | NaN | <h1>More from Rolling Glory Jam</h1><p><a href... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 722 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | game | Age of Barbarian Extended Cut | 402880 | 18 | False | full | <strong>INTRODUCTION</strong><br><br><i>In a w... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
723 rows × 147 columns
def drop_columns(df: DataFrame, columns) -> DataFrame:
    return df.drop(columns, axis=1)

def use_val(fun):
    # Values loaded back from CSV arrive as strings; parse them into Python objects first
    return lambda col: (fun(ast.literal_eval(col)) if type(col) is str else fun(col))

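To make the role of `use_val` concrete, here is a minimal, self-contained sketch. It assumes that `get_len` (defined elsewhere in the notebook) behaves roughly like the hypothetical `count_items` below; that helper is an illustration, not the notebook's code.

```python
import ast

def use_val(fun):
    """Wrap fun so it also accepts the string form of a Python literal."""
    return lambda col: fun(ast.literal_eval(col)) if isinstance(col, str) else fun(col)

def count_items(value):
    # Hypothetical stand-in for the notebook's get_len helper
    return len(value) if isinstance(value, (list, dict)) else 0

count = use_val(count_items)

from_csv = count("[{'id': 1}, {'id': 2}]")  # string, as read back from a CSV file
from_api = count([{'id': 1}, {'id': 2}])    # live object, as returned by the API
missing = count(float('nan'))               # NaN marks a missing value

print(from_csv, from_api, missing)
```

Note that `ast.literal_eval` raises an error for strings that are not Python literals, so this wrapper only suits columns that hold serialized lists or dicts.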
def map_steam_api_game_data(ndf: DataFrame) -> DataFrame:
    df = ndf.copy()
    # Free-text, media, and legal fields that are not used as features
    ignored_columns = [
        'type',
        'name',
        'steam_appid',
        'detailed_description',
        'about_the_game',
        'short_description',
        # 'fullgame',
        'header_image',
        'website',
        'pc_requirements',
        'mac_requirements',
        'linux_requirements',
        'legal_notice',
        'price_overview',
        'package_groups',
        'screenshots',
        'achievements',
        'background',
        'content_descriptors',
        'support_info',
        'ext_user_account_notice',
        'reviews',
        'drm_notice'
    ]
    df = drop_columns(df, ignored_columns)

    # Replace list-valued fields with their lengths
    for (new_col, col) in (('number_of_demos', 'demos'), ('num_of_game_videos', 'movies'), ('num_of_dlc', 'dlc'), ('num_of_packages_game_is_in', 'packages')):
        df[new_col] = df[col].apply(use_val(get_len))
        df = drop_columns(df, col)

    # Metacritic score scaled to the 0-1 range
    df['metacritic_score'] = df['metacritic'].apply(use_val(pluckBy('score', lambda score: score / 100)))
    df = drop_columns(df, 'metacritic')

    # One 0/1 flag per supported platform
    for (new_col, col, key) in (('windows_supported', 'platforms', 'windows'), ('mac_supported', 'platforms', 'mac'), ('linux_supported', 'platforms', 'linux')):
        df[new_col] = df[col].apply(use_val(pluckBy(key, lambda val: 1 if val == True else 0, lambda _: 0)))
    df = drop_columns(df, 'platforms')

    df['total_steam_recommendations'] = df['recommendations'].apply(use_val(pluck('total')))
    df = drop_columns(df, 'recommendations')

    df['release_date'] = df['release_date'].apply(use_val(pluckBy('date', parse_datetime)))

    # One-hot columns (named like 'genre.Race') hold NaN where the flag is absent
    for col in filter(lambda c: len(c.split('.')) > 1, df.columns):
        df[col] = df[col].fillna(False)
    return df

games_joined_with_steam_details = map_steam_api_game_data(games_dataframe)
save_df(games_joined_with_steam_details, 'games_details_data.csv')
Unknown string format: 16 of Dec. from 2014
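The message above shows that at least one release date ("16 of Dec. from 2014") did not match the format the parser expected. As a hedged sketch only, a more tolerant parser could normalize the text and try several formats. The function name `parse_release_date` and the list of candidate formats are assumptions for illustration; the notebook's own `parse_datetime` helper is not shown here. The `%b` directive relies on an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed date layouts; extend as new variants show up in the data
CANDIDATE_FORMATS = ('%d %b, %Y', '%b %d, %Y', '%d %b %Y', '%b %Y')

def parse_release_date(text: str) -> Optional[datetime]:
    # Drop filler words and dots, then collapse repeated whitespace
    cleaned = re.sub(r'\b(?:of|from)\b', ' ', text)
    cleaned = ' '.join(cleaned.replace('.', ' ').split())
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    # Unparseable values (for example free text) are reported as missing
    return None

odd_format = parse_release_date('16 of Dec. from 2014')
usual_format = parse_release_date('22 Feb, 2017')
free_text = parse_release_date('Coming soon')
```

Returning `None` instead of raising keeps the row in the dataset, with the release date treated as missing.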
games_joined_with_steam_details = load_df('games_details_data.csv')
games_joined_with_steam_details
| | steamId | title | history_link | required_age | is_free | controller_support | release_date | category.Single-player | category.Multi-player | category.PvP | ... | genre.Game Development | number_of_demos | num_of_game_videos | num_of_dlc | num_of_packages_game_is_in | metacritic_score | windows_supported | mac_supported | linux_supported | total_steam_recommendations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | False | 1 | 2 | 0 | 1 | NaN | 1 | 0 | 0 | 1037.0 |
| 1 | 1174180 | Red Dead Redemption 2 | https://isthereanydeal.com/game/reddeadredempt... | 0 | False | NaN | 2019-12-05 | True | True | True | ... | False | 0 | 2 | 0 | 2 | 0.93 | 1 | 0 | 0 | 210522.0 |
| 2 | 1091500 | Cyberpunk 2077 | https://isthereanydeal.com/game/cyberpunkii0vi... | 18 | False | NaN | 2020-12-09 | True | False | False | ... | False | 0 | 12 | 0 | 1 | 0.86 | 1 | 0 | 0 | 423847.0 |
| 3 | 442070 | Drawful 2 | https://isthereanydeal.com/game/drawfulii/hist... | 0 | False | full | 2016-06-20 | False | True | True | ... | False | 0 | 2 | 0 | 1 | NaN | 1 | 1 | 1 | 650.0 |
| 4 | 21660 | Street Fighter IV | https://isthereanydeal.com/game/streetfighteri... | 0 | False | NaN | 2009-07-07 | True | True | True | ... | False | 0 | 0 | 0 | 0 | 0.91 | 1 | 0 | 0 | 438.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 718 | 1004490 | Tools Up! | https://isthereanydeal.com/game/toolsup/history/ | 0 | False | full | 2019-12-03 | True | True | False | ... | False | 1 | 3 | 4 | 1 | NaN | 1 | 0 | 0 | 590.0 |
| 719 | 305380 | Blue Estate The Game | https://isthereanydeal.com/game/blueestategame... | 0 | False | NaN | 2015-04-08 | True | True | False | ... | False | 0 | 1 | 0 | 1 | 0.59 | 1 | 0 | 0 | 1026.0 |
| 720 | 696530 | Lake Ridden | https://isthereanydeal.com/game/lakeridden/his... | 0 | False | full | 2018-05-10 | True | False | False | ... | False | 0 | 2 | 1 | 1 | 0.68 | 1 | 0 | 0 | 185.0 |
| 721 | 1421760 | What Comes After | https://isthereanydeal.com/game/whatcomesafter... | 0 | False | NaN | 2020-11-05 | True | False | False | ... | False | 0 | 1 | 0 | 1 | NaN | 1 | 1 | 0 | 439.0 |
| 722 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | False | 0 | 2 | 2 | 1 | NaN | 1 | 0 | 0 | 656.0 |
723 rows × 127 columns
pounds_to_usd_conversion_rate = 1.34

def convert_to_usd(price: str) -> float:
    # Strip currency symbols and whitespace, e.g. '£9.99' -> '9.99'
    parsed_price = float(re.sub(r'[$£\s]', '', price))
    if '£' in price:
        parsed_price = parsed_price * pounds_to_usd_conversion_rate
    return parsed_price

def get_game_history_details(steamId: str, history_link: str) -> DataFrame:
    soup = cache_soup('games', steamId, lambda x: get_server_response(history_link))
    player_count = cache_json('steam_charts_player_count', steamId, get_steam_charts_player_count)
    records = []
    vals = soup.select("div.lg2.game")
    print(f'found {len(vals)} history values for game {steamId}')
    for val in vals:
        dateText = val.select_one("span.lg2__time-rel").attrs['title']
        date = datetime.datetime.strptime(dateText, '%a, %d %b %Y %H:%M:%S +0000')
        if len(player_count) > 0:
            # Use the player-count sample closest in time to this price record
            current_player_count = min(player_count, key=lambda x: abs(datetime.datetime.fromtimestamp(x[0] / 1000)-date))[1]
        else:
            print(f'Problem at steamId: {steamId}')
            current_player_count = None
        shop_title = val.select_one('.shopTitle').text.strip()
        regular_price_text = val.select_one('div:nth-child(2) > span.lg2__price').text
        price_now_text = val.select_one('div:nth-child(3) > span.lg2__price').text
        regular_price = convert_to_usd(regular_price_text)
        price_now = convert_to_usd(price_now_text)
        # Ratio of the current price to the regular price (1.0 means no discount)
        if regular_price != 0:
            price_change_percentage = price_now / regular_price
        else:
            price_change_percentage = 0
        records.append({'record_date': date, 'player_count': current_player_count, 'shop': shop_title, 'regular_price': regular_price, 'price_now': price_now, 'price_change': price_change_percentage})
    # Build the frame once from the collected rows (DataFrame.append is not available in pandas 2.x)
    return DataFrame(records)

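The player count attached to each price record is the sample closest in time to that record. The sketch below isolates that step under a few assumptions: samples are `[epoch_milliseconds, players]` pairs, record dates are naive UTC datetimes, and the sample values are made up for illustration. Unlike the notebook code, it converts the timestamps explicitly in UTC, so the comparison does not depend on the machine's local time zone.

```python
from datetime import datetime, timezone

def nearest_player_count(samples, when):
    """Return the player count of the sample closest in time to `when`.

    samples: list of [epoch_milliseconds, players] pairs
    when:    naive datetime expressed in UTC
    """
    if not samples:
        return None

    def distance(sample):
        sample_time = datetime.fromtimestamp(sample[0] / 1000, tz=timezone.utc)
        # Drop tzinfo so it can be compared with the naive UTC record date
        return abs(sample_time.replace(tzinfo=None) - when)

    return min(samples, key=distance)[1]

# Made-up samples at 2022-01-12, 2022-01-13 and 2022-01-14, midnight UTC
samples = [[1641945600000, 50], [1642032000000, 90], [1642118400000, 150]]
closest = nearest_player_count(samples, datetime(2022, 1, 13, 8, 10, 28))
no_data = nearest_player_count([], datetime(2022, 1, 13, 8, 10, 28))
```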
def get_games_with_history():
    frames = []
    l = list(games_joined_with_steam_details.iterrows())
    for index, row in tqdm(l, total=len(games_joined_with_steam_details.index)):
        try:
            row_df = DataFrame([row])
            game_history_details = get_game_history_details(row['steamId'], row['history_link'])
            # A constant key turns the merge into a cross join:
            # the game's row is repeated for every history record
            game_history_details['key'] = 1
            row_df['key'] = 1
            cross = row_df.merge(game_history_details, how = 'outer')
            frames.append(cross)
        except Exception as e:
            print(f'Exception at index: {index}', e)
    # Concatenate once at the end instead of growing a DataFrame inside the loop
    games_with_history = pd.concat(frames)
    games_with_history.dropna(subset=['shop', 'regular_price'], inplace=True)
    return games_with_history

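The constant `key` column is a way to repeat a game's details for each of its price records. The following sketch shows the idea on two tiny frames, using values taken from the output table, and compares it with `how='cross'`, which pandas supports from version 1.2. It is an illustration under those assumptions, not a replacement for the function above.

```python
import pandas as pd

game = pd.DataFrame([{'steamId': 359840, 'title': 'Shift Happens'}])
history = pd.DataFrame([
    {'shop': 'Steam', 'regular_price': 14.99, 'price_now': 1.49},
    {'shop': 'Fanatical', 'regular_price': 14.99, 'price_now': 11.01},
])

# Constant-key trick: every game row matches every history row
with_key = (
    game.assign(key=1)
        .merge(history.assign(key=1), on='key')
        .drop(columns='key')
)

# Built-in cross join, no helper column needed (pandas >= 1.2)
with_cross = game.merge(history, how='cross')

print(with_cross)
```

Both variants produce one row per price record, each carrying the game's columns.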
games_with_history = get_games_with_history()
games_with_history
save_df(games_with_history, "games_with_history.csv")
games_with_history = load_df("games_with_history.csv")
games_with_history
| | steamId | title | history_link | required_age | is_free | controller_support | release_date | category.Single-player | category.Multi-player | category.PvP | ... | mac_supported | linux_supported | total_steam_recommendations | key | record_date | player_count | shop | regular_price | price_now | price_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 0 | 0 | 1037.0 | 1 | 2022-01-14 18:01:14 | 170.0 | Humble Store | 14.99 | 14.99 | 1.000000 |
| 1 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 0 | 0 | 1037.0 | 1 | 2022-01-13 08:10:28 | 96.0 | Fanatical | 14.99 | 11.01 | 0.734490 |
| 2 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 0 | 0 | 1037.0 | 1 | 2022-01-12 18:36:23 | 56.0 | Steam | 14.99 | 1.49 | 0.099400 |
| 3 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 0 | 0 | 1037.0 | 1 | 2022-01-08 03:25:58 | 36.0 | Nuuvem | 14.49 | 14.49 | 1.000000 |
| 4 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 0 | 0 | 1037.0 | 1 | 2022-01-07 18:11:54 | 81.0 | Humble Store | 14.99 | 1.49 | 0.099400 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 401715 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 0 | 0 | 656.0 | 1 | 2016-10-31 17:16:26 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 401716 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 0 | 0 | 656.0 | 1 | 2016-10-24 17:15:38 | NaN | Steam | 12.99 | 9.87 | 0.759815 |
| 401717 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 0 | 0 | 656.0 | 1 | 2016-08-29 17:45:38 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 401718 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 0 | 0 | 656.0 | 1 | 2016-08-22 17:45:25 | NaN | Steam | 12.99 | 11.69 | 0.899923 |
| 401719 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 0 | 0 | 656.0 | 1 | 2016-06-03 19:52:31 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
401720 rows × 134 columns
def clean_game_history_details(ndf: DataFrame) -> DataFrame:
    df = ndf.copy()
    if 'shop' in df.columns:
        # Drop records from the Voidu shop, which appears under two spellings
        df.drop(df[df['shop'].isin(['voidu', 'Voidu'])].index, inplace=True)
    if 'regular_price' in df.columns:
        # A regular price of 0 makes the price ratio meaningless
        df.drop(df[df['regular_price'] == 0].index, inplace=True)
    df['controller_support'] = df['controller_support'].fillna('no')
    # The helper column used for the cross join is no longer needed
    df.drop(['key'], axis=1, inplace=True)
    return df

cleaned_games_with_history = clean_game_history_details(games_with_history)
save_df(cleaned_games_with_history, 'cleaned_games_with_history.csv')
cleaned_games_with_history = load_df('cleaned_games_with_history.csv')
cleaned_games_with_history
| | steamId | title | history_link | required_age | is_free | controller_support | release_date | category.Single-player | category.Multi-player | category.PvP | ... | windows_supported | mac_supported | linux_supported | total_steam_recommendations | record_date | player_count | shop | regular_price | price_now | price_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-14 18:01:14 | 170.0 | Humble Store | 14.99 | 14.99 | 1.000000 |
| 1 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-13 08:10:28 | 96.0 | Fanatical | 14.99 | 11.01 | 0.734490 |
| 2 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-12 18:36:23 | 56.0 | Steam | 14.99 | 1.49 | 0.099400 |
| 3 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-08 03:25:58 | 36.0 | Nuuvem | 14.49 | 14.49 | 1.000000 |
| 4 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-07 18:11:54 | 81.0 | Humble Store | 14.99 | 1.49 | 0.099400 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393450 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-31 17:16:26 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 393451 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-24 17:15:38 | NaN | Steam | 12.99 | 9.87 | 0.759815 |
| 393452 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-29 17:45:38 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 393453 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-22 17:45:25 | NaN | Steam | 12.99 | 11.69 | 0.899923 |
| 393454 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-06-03 19:52:31 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
393455 rows × 133 columns
cleaned_games_with_history = cleaned_games_with_history.loc[:, ~cleaned_games_with_history.columns.str.contains('^Unnamed')]
# Drop columns that hold a single value for every row (platform flags are kept)
for column in cleaned_games_with_history.columns:
    if 'supported' not in column and cleaned_games_with_history[column].nunique() == 1:
        cleaned_games_with_history = cleaned_games_with_history.drop(columns=[column])
cleaned_games_with_history = cleaned_games_with_history[cleaned_games_with_history["shop"] != "Dreamgame"] # Remove a shop with anomalous values
cleaned_games_with_history = cleaned_games_with_history[cleaned_games_with_history["regular_price"] < 80] # Remove outliers (games priced at $80 or more without any DLC)
cleaned_games_with_history = cleaned_games_with_history[cleaned_games_with_history["num_of_dlc"] < 50] # Remove games with an implausible number of DLCs (50 or more)
save_df(cleaned_games_with_history, 'games_after_cleaning_columns_with_only_one_unique_value.csv')
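A column with a single distinct value carries no information for a model, which is why such columns are dropped above. The sketch below shows the same rule on a small, made-up frame; the function name `drop_constant_columns` is hypothetical.

```python
import pandas as pd

def drop_constant_columns(df: pd.DataFrame, keep_substring: str = 'supported') -> pd.DataFrame:
    # nunique() ignores NaN, so an all-NaN column counts 0 values and is kept
    constant = [
        column for column in df.columns
        if keep_substring not in column and df[column].nunique() == 1
    ]
    return df.drop(columns=constant)

sample = pd.DataFrame({
    'steamId': [1, 2, 3],
    'is_free': [False, False, False],    # constant, dropped
    'windows_supported': [1, 1, 1],      # constant, kept by name
    'regular_price': [9.99, 14.99, 19.99],
})
result = drop_constant_columns(sample)
print(list(result.columns))
```

Collecting the column names first and dropping them in one call avoids modifying the frame while iterating over its columns.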
pie_chart_size = 10
default_plot_title_size = 24
default_pie_label_size = 15
default_2d_label_size = 20
def set_default_pie_options(settings):
    if not settings.get('label_size'):
        settings['label_size'] = default_pie_label_size
    # Fill in the default only when the key is missing, so an explicit False is kept
    settings.setdefault('is_percentage', True)

def set_default_one_d_options(settings):
    if not settings:
        return
    settings['title_size'] = settings['title_size'] if settings.get('title_size') else default_plot_title_size
    settings['title'] = settings['title'] if settings.get('title') else ''
    settings['pctdistance'] = settings['pctdistance'] if settings.get('pctdistance') else 1.2
    settings['labeldistance'] = settings['labeldistance'] if settings.get('labeldistance') else 1.4

def get_frequent_elements(df: DataFrame, col_name: str, num_top_elements: int) -> pd.Series:
    # value_counts() sorts by frequency, so the first rows are the most common values
    return pd.Series(df[col_name].value_counts().iloc[:num_top_elements].sort_index())

def one_dim_plot(sr: pd.Series, plot_type: str, axis=None, plot_style: dict = None):
    # Treat a missing style as an empty one so the lookups below are safe
    plot_style = plot_style if plot_style is not None else {}
    set_default_one_d_options(plot_style)
    options = dict()
    options['kind'] = plot_type
    if axis is not None:
        options['ax'] = axis
    if plot_style.get('is_percentage'):
        options['autopct'] = '%1.0f%%'  # only meaningful for pie charts
    if plot_style.get('title'):
        options['title'] = plot_style.get('title')
    if plot_style.get('label_size'):
        options['fontsize'] = plot_style.get('label_size')
    return sr.plot(**options)

def plot_scatter(df: DataFrame, x: str, y: str):
    sns.lmplot(x=x, y=y, data=df, fit_reg=True)

def set_pie_plot_styling(plot=None, settings: dict = None):
    if settings is None:
        return
    if settings.get('title_size'):
        plt.rcParams['figure.titlesize'] = settings.get('title_size')
    if plot is not None:
        if settings.get('title'):
            # Set the title on the given axes rather than on whichever axes is current
            plot.set_title(settings.get('title'), pad=settings.get('title_size'), fontsize=settings.get('title_size'))
        if settings.get('sq_size') and settings.get('num_of_charts'):
            plt.rcParams['figure.figsize'] = (settings['sq_size']*settings['num_of_charts'], settings['sq_size'])
        elif settings.get('sq_size'):
            plt.rcParams['figure.figsize'] = (settings['sq_size'], settings['sq_size'])
        if settings.get('y_label'):
            plot.set_ylabel(settings.get('y_label'), labelpad=settings.get('label_size'), fontsize=settings.get('label_size'))

def set_plot_styling(plot, settings: dict = None):
    set_default_one_d_options(settings)
    if settings is None:
        return
    if settings.get('sq_size'):
        plt.rcParams['figure.figsize'] = (settings['sq_size'], settings['sq_size'])
    if settings.get('remove_x_labels'):
        plot.set(xticklabels=[])
    if settings.get('remove_y_labels'):
        plot.set(yticklabels=[])
    # rcParams stores the figure size as (width, height) in inches
    if settings.get('x_size'):
        _, height = plt.rcParams['figure.figsize']
        plt.rcParams['figure.figsize'] = (settings['x_size'], height)
    if settings.get('y_size'):
        width, _ = plt.rcParams['figure.figsize']
        plt.rcParams['figure.figsize'] = (width, settings['y_size'])
    if settings.get('title'):
        plot.set_title(settings.get('title'), pad=settings.get('title_size'), fontsize=settings.get('title_size'))
    if settings.get('x_label_rotation'):
        plot.set_xticklabels(plot.get_xticklabels(), rotation=settings.get('x_label_rotation'))
    if settings.get('y_label_rotation'):
        plot.set_yticklabels(plot.get_yticklabels(), rotation=settings.get('y_label_rotation'))
    if settings.get('x_label'):
        plot.set_xlabel(settings.get('x_label'), labelpad=settings.get('label_size'), fontsize=settings.get('label_size'))
    if settings.get('y_label'):
        plot.set_ylabel(settings.get('y_label'), labelpad=settings.get('label_size'), fontsize=settings.get('label_size'))

def plot_violin(df: DataFrame, options: dict, plot_style: dict):
    plot = sns.violinplot(data=df, cut=0, scale='width', **options)
    set_plot_styling(plot, plot_style)

def plot_line(df: DataFrame, x: str, y: str, plot_style: dict = None):
    plot = sns.lineplot(x=x, y=y, data=df)
    if plot_style:
        set_plot_styling(plot=plot, settings=plot_style)

def plot_frequency(df: DataFrame, col: str, axis = None, plot_style: dict = None):
    # Pie chart of the 5 most frequent values in the column
    plot_style = plot_style if plot_style is not None else {}
    elm = get_frequent_elements(df, col, 5)
    if plot_style.get('num_of_charts'):
        plt.rcParams['figure.figsize'] = (pie_chart_size*plot_style['num_of_charts'], pie_chart_size)
    if axis is not None:
        plot = one_dim_plot(elm, 'pie', axis, plot_style=plot_style)
        set_pie_plot_styling(axis, plot_style)
    else:
        plot = one_dim_plot(elm, 'pie', plot_style=plot_style)
        set_pie_plot_styling(plot, plot_style)
    return plot

def plot_bar_chart(df: DataFrame, sizeX: int = None, sizeY: int = None):
    if sizeX and sizeY:
        plt.rcParams['figure.figsize'] = (sizeX, sizeY)
    df.plot.bar(rot=0)

def plot_pie_remove_duplicates(df: DataFrame, key: str, axis=None, plot_style: dict = None):
    plot_df = df.copy()
    # Keep one row per (game, shop) so each game is counted once, not once per price record
    plot_df.drop_duplicates(subset=['steamId', 'shop'], inplace=True)
    plot_frequency(plot_df, key, axis=axis, plot_style=plot_style)

def plot_group_by_x(ndf: DataFrame, x: str, y: str, plot_style: dict):
    df = ndf[[x,y]].copy().dropna()
    plt.rcParams['figure.figsize'] = (10,10)
    # Split the range of x into three equal-width groups
    min_x = round(df[x].min(), 2)
    max_x = round(df[x].max(), 2)
    group1_max = round((max_x-min_x)/3 + min_x, 2)
    group2_max = round((max_x-min_x)*2/3 + min_x, 2)
    group1_str = f'{min_x} to {group1_max}'
    group2_str = f'{group1_max} to {group2_max}'
    group3_str = f'{group2_max} to {max_x}'
    # Half-open intervals: a value equal to a boundary falls into the upper group
    df[x] = df[x].apply(lambda value:
        group1_str if value < group1_max else
        group2_str if value < group2_max else
        group3_str)
    plot_violin(df, {
        'x': x,
        'y': y,
        'order': [group1_str, group2_str, group3_str]
    }, plot_style)

def plot_group_by_x_percentage(ndf: DataFrame, x: str, y: str, plot_style: dict):
    df = ndf.copy()
    # Convert a 0-1 ratio into a percentage before grouping
    df[x] = df[x] * 100
    plot_group_by_x(df, x, y, plot_style)

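The grouping step splits the range of a column into three equal-width intervals and labels each value with its interval. A self-contained sketch of that labeling, with a hypothetical helper `make_tercile_labeler` and made-up bounds, shows how values that sit exactly on a boundary are assigned:

```python
def make_tercile_labeler(min_x: float, max_x: float):
    """Return (label_function, labels) for three equal-width intervals."""
    span = max_x - min_x
    first_cut = round(min_x + span / 3, 2)
    second_cut = round(min_x + span * 2 / 3, 2)
    labels = (
        f'{min_x} to {first_cut}',
        f'{first_cut} to {second_cut}',
        f'{second_cut} to {max_x}',
    )

    def label(value: float) -> str:
        # Half-open intervals: a boundary value belongs to the upper group
        if value < first_cut:
            return labels[0]
        if value < second_cut:
            return labels[1]
        return labels[2]

    return label, labels

label, labels = make_tercile_labeler(0.0, 90.0)
print(labels)
print(label(29.99), '|', label(30.0), '|', label(90.0))
```

Comparing with `<` against each cut in turn guarantees that every value lands in exactly one group, including values equal to a cut.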
def plot_game_support(ndf: DataFrame):
    df = ndf.copy()
    df['windows_supported'] = df['windows_supported']==1
    df['mac_supported'] = df['mac_supported']==1
    df['linux_supported'] = df['linux_supported']==1
    # One pie chart per platform, side by side
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
    fig.suptitle('Platforms supported')
    fig.set_size_inches(pie_chart_size*3, pie_chart_size)
    plt.rcParams['axes.titlesize']=default_plot_title_size
    plot_pie_remove_duplicates(df, 'windows_supported', ax1, plot_style={
        'title':"Is Windows Supported",
        'num_of_charts':3,
        'y_label':' ',
        'is_percentage': True
    })
    plot_pie_remove_duplicates(df, 'mac_supported', ax2, plot_style={
        'title':"Is Mac Supported",
        'num_of_charts':3,
        'y_label':' ',
        'is_percentage': True
    })
    plot_pie_remove_duplicates(df, 'linux_supported', ax3, plot_style={
        'title':"Is Linux Supported",
        'num_of_charts':3,
        'y_label':' ',
        'is_percentage': True
    })

def plot_game_price_history(ndf: DataFrame, steam_id: int, year: int = None, show_holidays = False):
    df = ndf.copy().filter(['steamId', 'title', 'record_date', 'price_change'], axis=1)
    # price_change is price_now / regular_price, so 1 - ratio is the discount fraction
    df['price_change'] = 1 - df['price_change']
    df = df[df['steamId'] == steam_id].copy()
    game_title = df['title'].iloc[0]
    df.drop('title', inplace=True, axis=1)
    # record_date looks like: 2012-09-18 14:23:06
    df['record_date'] = pd.to_datetime(df['record_date'], format='%Y-%m-%d %H:%M:%S')
    df = df[df['record_date'].notna()]
    # Rolling mean over 7 consecutive records (not 7 calendar days)
    df['7day_rolling_avg'] = df['price_change'].rolling(7).mean()
    df.drop('steamId', axis=1, inplace=True)
    if year is not None:
        # Keep only the records of the requested calendar year
        df = df[df['record_date'].dt.year == year]
    if df.empty:
        return
    df = df.set_index('record_date')
    least_date: datetime.datetime = df.index.min()
    most_date: datetime.datetime = df.index.max()
    # Fixed (month, day) markers; Black Friday actually falls on a different date each year
    holidays = [('Christmas', 12, 25),
                ('Black Friday', 11, 25),
                ('Chinese Single\'s Day', 11, 11),
                ('Steam Summer Sale', 10, 1)]
    df = df[~df.index.duplicated()]
    plot_style = {
        'title':f'Game price change over time, Game: {game_title}' + (f' at year {year}' if year else ''),
        'y_label':"Price drop (1 is 100% off)",
        'x_label':"Date"
    }
    plot_line(df=df, x='record_date', y='price_change', plot_style=plot_style)
    plot_line(df=df, x='record_date', y='7day_rolling_avg', plot_style=plot_style)
    if show_holidays:
        # Mark each holiday with a dashed vertical line in every year covered by the data
        for (holiday, month, day) in holidays:
            for i in range (least_date.year, most_date.year + 1):
                plt.axvline(x=datetime.datetime(i, month, day),color='m', linestyle="--")
                plt.text(datetime.datetime(i, month, day),0,holiday,rotation=90)
    # Start a new figure so the next call does not draw on this one
    plt.figure()

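The holiday markers above use one fixed date per holiday, but Black Friday is the day after US Thanksgiving (the fourth Thursday of November), so its date changes from year to year. As a small sketch using only the standard library, the date for a given year could be computed as follows; the function name `black_friday` is an assumption for illustration.

```python
import datetime

def black_friday(year: int) -> datetime.date:
    """Day after the fourth Thursday of November."""
    first_of_november = datetime.date(year, 11, 1)
    # weekday(): Monday is 0, so Thursday is 3
    days_to_first_thursday = (3 - first_of_november.weekday()) % 7
    thanksgiving = first_of_november + datetime.timedelta(days=days_to_first_thursday + 21)
    return thanksgiving + datetime.timedelta(days=1)

for year in (2019, 2021, 2022):
    print(year, black_friday(year))
```

A marker computed this way could replace the fixed `(11, 25)` entry inside the loop over years if exact alignment with sale dates matters for the plot.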
def get_n_games_row_count(df: DataFrame, n: int) -> list:
    # The n games with the most history rows, as (steam_id, title, row_count) tuples
    games = dict(df['steamId'].value_counts().sort_values(ascending=False)[:n])
    games_list = list()
    for (steam_id, count) in games.items():
        games_list.append((steam_id, df[df['steamId']==steam_id].iloc[0]['title'], count))
    return games_list

def plot_regular_price_per_genre(ndf: DataFrame):
    df = ndf.copy().drop_duplicates(['steamId', 'shop'])
    genre_columns = [column for column in df.columns if str(column).startswith('genre.')]
    genre_averages = dict()
    for genre_column in genre_columns:
        # Average regular price over all rows tagged with this genre
        genre_averages[genre_column] = df[df[genre_column] == True]['regular_price'].mean()
    print(df['genre.Strategy'].describe())
    # Keep the 10 genres with the highest average price
    averages = list(genre_averages.items())
    averages = sorted(averages, key=lambda k: k[1], reverse=True)[:10]
    keys = list(map(lambda item: truncate(item[0].split('.')[1], 10), averages))
    values = list(map(lambda item: item[1], averages))
    plot_bar_chart(DataFrame(values, keys), sizeX=30, sizeY=5)

# Plot the price distribution of 10 random publishers:
# drop duplicate games by steamId
# group by publisher
# pick 10 publishers at random
# present as a violin plot
def plot_price_per_10_random_publishers(ndf: DataFrame):
    df = ndf[['steamId', 'publisher', 'regular_price']].copy()
    df.drop_duplicates(subset=['steamId'], inplace=True)
    df['publisher'] = df['publisher'].astype('category')
    df_grouped_by_publisher = df.groupby(['publisher'])
    grouped_by_publisher_list = list(df_grouped_by_publisher)
    random.shuffle(grouped_by_publisher_list)
    # Keep the rows of the first 10 publishers after shuffling
    plot_df = pd.concat([dataframe for (publisher, dataframe) in grouped_by_publisher_list[:10]])
    plot_df['publisher'] = plot_df['publisher'].cat.remove_unused_categories()
    plot_violin(plot_df, {
        'x': 'publisher',
        'y': 'regular_price'
    }, {
        'title':'Game Price / Publisher',
        'x_label_rotation':90,
        'x_label':'Publisher',
        'y_label':'Regular Price'
    })

cleaned_games_with_history
| | steamId | title | history_link | required_age | is_free | controller_support | release_date | category.Single-player | category.Multi-player | category.PvP | ... | windows_supported | mac_supported | linux_supported | total_steam_recommendations | record_date | player_count | shop | regular_price | price_now | price_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-14 18:01:14 | 170.0 | Humble Store | 14.99 | 14.99 | 1.000000 |
| 1 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-13 08:10:28 | 96.0 | Fanatical | 14.99 | 11.01 | 0.734490 |
| 2 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-12 18:36:23 | 56.0 | Steam | 14.99 | 1.49 | 0.099400 |
| 3 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-08 03:25:58 | 36.0 | Nuuvem | 14.49 | 14.49 | 1.000000 |
| 4 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-07 18:11:54 | 81.0 | Humble Store | 14.99 | 1.49 | 0.099400 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393450 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-31 17:16:26 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 393451 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-24 17:15:38 | NaN | Steam | 12.99 | 9.87 | 0.759815 |
| 393452 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-29 17:45:38 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 393453 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-22 17:45:25 | NaN | Steam | 12.99 | 11.69 | 0.899923 |
| 393454 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-06-03 19:52:31 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
387056 rows × 125 columns
plot_pie_remove_duplicates(
df=cleaned_games_with_history,
key='shop',
plot_style={
'title':'Most frequent shop',
'y_label':"Shop",
'is_percentage': True
}
)
plot_group_by_x_percentage(
cleaned_games_with_history,
'metacritic_score',
'total_steam_recommendations',
{
'x_label':'Metacritic Score',
'title':'Metacritic score per total steam recommendations',
'y_label':'Total steam recommendations'
}
)
plot_group_by_x(
cleaned_games_with_history,
'regular_price',
'num_of_dlc',
{
'title':'Regular Price / Number of dlcs available',
'x_label':'Regular price for a game in USD',
'y_label':'Number of dlcs',
}
)
We can see that the regular price is often linked to the Metacritic score the game received.
plot_group_by_x_percentage(
cleaned_games_with_history,
'metacritic_score',
'regular_price',
{
'title':'Metacritic score / Regular Price',
'y_label':'Regular Price in USD',
'x_label':'Metacritic Score',
}
)
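One way to put a number on the link between score and price is a rank correlation computed once per game. The sketch below uses a small made-up frame that only borrows the notebook's column names (`steamId`, `metacritic_score`, `regular_price`); the values are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: several price records per game, as in the notebook's frame.
toy = pd.DataFrame({
    'steamId':          [1, 1, 2, 3, 4, 5],
    'metacritic_score': [55, 55, 68, 74, 81, 90],
    'regular_price':    [9.99, 9.99, 19.99, 14.99, 39.99, 59.99],
})

# Keep one row per game so games with long price histories are not over-weighted.
per_game = toy.drop_duplicates('steamId')

# Spearman correlation is the Pearson correlation of the ranks; it only assumes
# a monotonic relationship between score and price, not a linear one.
rho = per_game['metacritic_score'].rank().corr(per_game['regular_price'].rank())
print(f"Spearman correlation: {rho:.2f}")
```

A value close to 1 would support the reading of the plot, while a value near 0 would suggest the visual trend is driven by a few games.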
We can see that the most expensive publishers tend to be big companies.
plot_price_per_10_random_publishers(cleaned_games_with_history)
We can see that the average price of Action, Adventure and RPG games is higher,
probably because games in these genres take far longer to develop.
plot_regular_price_per_genre(cleaned_games_with_history)
count 9466 unique 2 top False freq 7102 Name: genre.Strategy, dtype: object
plot_pie_remove_duplicates(cleaned_games_with_history, 'controller_support', plot_style={
'title':'Controller Support Pie Chart',
'y_label':"Controller Support",
'is_percentage': True
})
We can see that games that support Mac probably also support Linux, and that all games support Windows.
plot_game_support(cleaned_games_with_history)
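The Mac/Linux overlap can also be checked numerically with a cross-tabulation. This is a minimal sketch on invented rows that reuse the notebook's `mac_supported` and `linux_supported` column names; the counts it produces say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy data: 1 means the platform is supported.
toy = pd.DataFrame({
    'steamId':         [1, 1, 2, 3, 4, 5],
    'mac_supported':   [1, 1, 0, 1, 0, 0],
    'linux_supported': [1, 1, 0, 0, 0, 1],
})

# Count each game once, regardless of how many price records it has.
per_game = toy.drop_duplicates('steamId')

# Rows are Mac support, columns are Linux support.
table = pd.crosstab(per_game['mac_supported'], per_game['linux_supported'])
print(table)

# Share of Mac-supporting games that also support Linux.
mac_games = per_game[per_game['mac_supported'] == 1]
share = mac_games['linux_supported'].mean()
print(f"Mac games that also support Linux: {share:.0%}")
```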
We can see that most games start offering discounts of more than 20% after the first year of release,
reach 50% at the start of the following year, and reach a maximum discount of 80% in the third and fourth years.
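The discount schedule described above could be tabulated by bucketing records by whole years since release. The sketch below assumes, as the table rows suggest, that `price_change` is `price_now / regular_price`, so the discount is its complement; the dates and prices are made up.

```python
import pandas as pd

# Hypothetical price records for a single game.
toy = pd.DataFrame({
    'release_date': pd.to_datetime(['2017-02-22'] * 4),
    'record_date':  pd.to_datetime(['2017-06-01', '2018-03-01', '2019-03-01', '2020-07-01']),
    'price_change': [1.0, 0.75, 0.5, 0.2],
})

# Convert the price ratio into a discount percentage.
toy['discount_pct'] = (1 - toy['price_change']) * 100

# Whole years elapsed between release and the price record.
toy['years_since_release'] = (toy['record_date'] - toy['release_date']).dt.days // 365

# Deepest discount observed in each year after release.
deepest = toy.groupby('years_since_release')['discount_pct'].max()
print(deepest)
```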
games = get_n_games_row_count(cleaned_games_with_history, 5)
for (steam_id, game_title, _) in games:
    plot_game_price_history(cleaned_games_with_history, steam_id)
We also see that game prices drop substantially around holidays,
but, as in the previous plot, after the second year the time of year is no longer a factor.
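A rough way to test the holiday effect is to flag records that fall within a few days of a holiday and compare the average price ratio inside and outside those windows. The holiday list and the 7-day window below are assumptions made for illustration; the notebook's own holiday list is defined elsewhere.

```python
import pandas as pd

# Hypothetical price records; price_change is price_now / regular_price.
toy = pd.DataFrame({
    'record_date':  pd.to_datetime(['2018-03-10', '2018-07-15', '2018-12-24',
                                    '2018-12-28', '2019-01-20']),
    'price_change': [1.0, 0.9, 0.4, 0.35, 1.0],
})

# Assumed holidays as (month, day) pairs and an assumed window size.
holidays = [(12, 25), (1, 1)]
window = pd.Timedelta(days=7)

def near_holiday(ts):
    # Check neighbouring years too, so records around New Year match either side.
    for month, day in holidays:
        for year in (ts.year - 1, ts.year, ts.year + 1):
            if abs(ts - pd.Timestamp(year=year, month=month, day=day)) <= window:
                return True
    return False

toy['near_holiday'] = toy['record_date'].apply(near_holiday)

holiday_mean = toy.loc[toy['near_holiday'], 'price_change'].mean()
regular_mean = toy.loc[~toy['near_holiday'], 'price_change'].mean()
print(f"near holidays: {holiday_mean:.3f}, otherwise: {regular_mean:.3f}")
```

Running the same comparison separately for each year since release would show whether the gap really closes after the second year.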
games = get_n_games_row_count(cleaned_games_with_history, 1)
years = list(range(2017, 2021))
for (steam_id, game_title, _) in tqdm(games):
    for year in years:
        plot_game_price_history(cleaned_games_with_history, steam_id, year=year, show_holidays=True)
advanced_analysis_and_ml = load_df('games_after_cleaning_columns_with_only_one_unique_value.csv')
advanced_analysis_and_ml
| | steamId | title | history_link | required_age | is_free | controller_support | release_date | category.Single-player | category.Multi-player | category.PvP | ... | windows_supported | mac_supported | linux_supported | total_steam_recommendations | record_date | player_count | shop | regular_price | price_now | price_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-14 18:01:14 | 170.0 | Humble Store | 14.99 | 14.99 | 1.000000 |
| 1 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-13 08:10:28 | 96.0 | Fanatical | 14.99 | 11.01 | 0.734490 |
| 2 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-12 18:36:23 | 56.0 | Steam | 14.99 | 1.49 | 0.099400 |
| 3 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-08 03:25:58 | 36.0 | Nuuvem | 14.49 | 14.49 | 1.000000 |
| 4 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-07 18:11:54 | 81.0 | Humble Store | 14.99 | 1.49 | 0.099400 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 387051 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-31 17:16:26 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 387052 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-24 17:15:38 | NaN | Steam | 12.99 | 9.87 | 0.759815 |
| 387053 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-29 17:45:38 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 387054 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-22 17:45:25 | NaN | Steam | 12.99 | 11.69 | 0.899923 |
| 387055 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-06-03 19:52:31 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
387056 rows × 125 columns
advanced_analysis_and_ml_steam_only = advanced_analysis_and_ml[advanced_analysis_and_ml['shop']=='Steam']
save_df(advanced_analysis_and_ml_steam_only, "advanced_analysis_and_ml_steam_only.csv")
advanced_analysis_and_ml_steam_only = load_df("advanced_analysis_and_ml_steam_only.csv")
advanced_analysis_and_ml_steam_only
| | steamId | title | history_link | required_age | is_free | controller_support | release_date | category.Single-player | category.Multi-player | category.PvP | ... | windows_supported | mac_supported | linux_supported | total_steam_recommendations | record_date | player_count | shop | regular_price | price_now | price_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-12 18:36:23 | 56.0 | Steam | 14.99 | 1.49 | 0.099400 |
| 1 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2022-01-05 18:28:12 | 135.0 | Steam | 14.99 | 14.99 | 1.000000 |
| 2 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2021-12-22 21:59:23 | 59.0 | Steam | 14.99 | 1.49 | 0.099400 |
| 3 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2021-12-01 20:52:40 | 199.0 | Steam | 14.99 | 14.99 | 1.000000 |
| 4 | 359840 | Shift Happens | https://isthereanydeal.com/game/shifthappens/h... | 0 | False | full | 2017-02-22 | True | True | True | ... | 1 | 0 | 0 | 1037.0 | 2021-11-24 18:22:27 | 208.0 | Steam | 14.99 | 1.49 | 0.099400 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 39869 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-31 17:16:26 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 39870 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-10-24 17:15:38 | NaN | Steam | 12.99 | 9.87 | 0.759815 |
| 39871 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-29 17:45:38 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
| 39872 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-08-22 17:45:25 | NaN | Steam | 12.99 | 11.69 | 0.899923 |
| 39873 | 402880 | Age of Barbarian Extended Cut | https://isthereanydeal.com/game/ageofbarbarian... | 18 | False | full | 2016-06-03 | True | False | False | ... | 1 | 0 | 0 | 656.0 | 2016-06-03 19:52:31 | NaN | Steam | 12.99 | 12.99 | 1.000000 |
39874 rows × 125 columns
def save_model(t: str, name: str, model):
    # The context manager closes the file even if pickling raises
    with open(f'./results/models/{t}/{name}.pickle', "wb") as f:
        pickle.dump(model, f)
def plot_correlation(ndf: DataFrame):
    df = ndf.copy()
    df = df.loc[:, ~df.columns.str.startswith('supported_language')]
    df = df.loc[:, ~df.columns.str.startswith('publisher')]
    df = df.loc[:, ~df.columns.str.startswith('developer')]
    # df = df.loc[:, ~df.columns.str.startswith('category')]
    # df.drop(['title', 'windows_supported', 'history_link', 'steamId', 'num_of_dlc', 'number_of_demos', 'num_of_game_videos', 'num_of_packages_game_is_in'], axis=1, inplace=True)
    # TEMP
    df['extra_content'] = df['num_of_dlc'] + df['number_of_demos'] + df['num_of_game_videos'] + df['num_of_packages_game_is_in']
    df['is_multiplatform'] = df['linux_supported'] | df['mac_supported']
    df.drop(['title', 'price_now', 'windows_supported', 'linux_supported', 'mac_supported', 'history_link', 'fullgame', 'steamId', 'player_count', 'num_of_dlc', 'number_of_demos', 'num_of_game_videos', 'num_of_packages_game_is_in'], axis=1, inplace=True)
    df['controller_support'] = df['controller_support'].apply(lambda row: 1 if row == 'full' else 0)
    df['required_age'] = df['required_age'].astype("category")
    df = df.loc[:, ~df.columns.str.startswith('category')]
    # Merge genre columns that describe the same genre under different names
    df['is_action'] = df['genre.Action'] + df['genre.action'] + df['genre.Action games']
    df['is_adventure'] = df['genre.Adventure'] + df['genre.Adventure games']
    df['is_rpg'] = df['genre.RPG']
    df['is_race'] = df['genre.Race']
    df['is_casual'] = df['genre.Casual'] + df['genre.Casual games']
    df['is_strategy'] = df['genre.Strategy']
    df['is_simulation'] = df['genre.Simulation'] + df['genre.Simulators']
    df['is_violent'] = df['genre.Violent'] + df['genre.Gore']
    df['is_multiplayer'] = df['genre.Multiplayer games']
    df = df.loc[:, ~df.columns.str.startswith('genre.')]
    # END TEMP
    corr = df.corr().abs()
    # upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    # to_drop = [column for column in upper.columns if any(upper[column] > 0.2)] # dont remove all
    # df.drop(to_drop, axis=1, inplace=True)
    # Hide the upper triangle and the diagonal so each pair of features is shown once.
    # The builtin bool replaces np.bool, which is deprecated since NumPy 1.20.
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True
    plt.subplots(figsize=(11, 9))
    cmap = sns.diverging_palette(0, 200, s=160, l=55, n=9, as_cmap=True)
    sns.heatmap(corr, cmap=cmap, vmax=.5, vmin=0, center=0, mask=mask, square=True, linewidths=1)
    plt.title('Correlation of features')
plot_correlation(advanced_analysis_and_ml_steam_only)
def load_dataframe(ndf: DataFrame):
    df = ndf.copy()
    df['extra_content'] = df['num_of_dlc'] + df['number_of_demos'] + df['num_of_game_videos'] + df['num_of_packages_game_is_in']
    df.drop(['title', 'price_now', 'windows_supported', 'history_link', 'fullgame', 'steamId', 'player_count', 'num_of_dlc', 'number_of_demos', 'num_of_game_videos', 'num_of_packages_game_is_in'], axis=1, inplace=True)
    df['controller_support'] = df['controller_support'].apply(lambda row: 1 if row == 'full' else 0)
    df['required_age'] = df['required_age'].astype("category")
    df = df.loc[:, ~df.columns.str.startswith('category')]
    # Merge genre columns that describe the same genre under different names
    df['is_action'] = df['genre.Action'] + df['genre.action'] + df['genre.Action games']
    df['is_adventure'] = df['genre.Adventure'] + df['genre.Adventure games']
    df['is_rpg'] = df['genre.RPG']
    df['is_race'] = df['genre.Race']
    df['is_casual'] = df['genre.Casual'] + df['genre.Casual games']
    df['is_strategy'] = df['genre.Strategy']
    df['is_simulation'] = df['genre.Simulation'] + df['genre.Simulators']
    df['is_violent'] = df['genre.Violent'] + df['genre.Gore']
    df = df.loc[:, ~df.columns.str.startswith('genre.')]
    # Fill missing Metacritic scores with the mean score
    avg_metacritic_score = df['metacritic_score'].mean()
    df['metacritic_score'] = df['metacritic_score'].fillna(avg_metacritic_score)
    df['is_good_meta_score'] = df['metacritic_score'] > 6
    df['is_bad_meta_score'] = df['metacritic_score'] < 6
    df['total_steam_recommendations'] = df['total_steam_recommendations'].fillna(0)
    # One-hot encode the categorical columns. The original df.append(...) call discarded
    # its result, so the dummy columns were never added; here they are joined column-wise.
    # The column-name prefix is an addition to keep the new column names unambiguous.
    categorical_data = df.select_dtypes(['category'])
    for category_col in categorical_data:
        dummies = pd.get_dummies(df[category_col], prefix=category_col, drop_first=True)
        df = pd.concat([df, dummies], axis=1)
        df.drop(category_col, axis=1, inplace=True)
    # Convert dates to Unix timestamps in seconds
    df['release_date'] = pd.to_datetime(df['release_date'])
    df['release_date'] = df['release_date'].values.astype(np.int64) // 10 ** 9
    # If the score is still missing, the whole column was NaN, so drop it
    if df['metacritic_score'].iloc[0] is None or np.isnan(df['metacritic_score'].iloc[0]):
        df.drop(['metacritic_score'], axis=1, inplace=True)
    df['record_date'] = pd.to_datetime(df['record_date'])
    df['record_date'] = df['record_date'].values.astype(np.int64) // 10 ** 9
    # Encode the remaining text columns as integer category codes
    df['developer'] = df['developer'].astype('category').cat.codes
    df['publisher'] = df['publisher'].astype('category').cat.codes
    df['shop'] = df['shop'].astype('category').cat.codes
    df.set_index(pd.DatetimeIndex(df['record_date']*10**9), inplace=True, drop=True)
    return df
def split_propreties_test(ndf: DataFrame, target_column):
    # Split the prepared frame into features (X) and the target column (Y)
    df = load_dataframe(ndf)
    Y = df.pop(target_column)
    return df, Y
def train_linear_regression_model(X_train, y_train):
    model = linear_model.LinearRegression()
    return model.fit(X_train, y_train)
def train_random_forest_model(X_train, y_train):
    clf = RandomForestRegressor(n_estimators=150)
    return clf.fit(X_train,y_train)
def train_xgboost(X_train, y_train):
    clf = xgb.XGBRegressor(verbosity=0)
    return clf.fit(X_train,y_train)
def predict_evaluate_performance(model, X_test, y_test, **kargs):
    predicted = model.predict(X_test)
    return {'predicted': predicted, 'score': r2_score(y_test, predicted)}
def create_model(ndf, train_model_fn) -> dict:
    X, y = split_propreties_test(ndf, 'price_change')
    # shuffle=False keeps row order: the first 60% of rows train, the last 40% test
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, shuffle=False)
    return {'model': train_model_fn(X_train=X_train, y_train=y_train), 'X_test': X_test, 'y_test': y_test}
def create_train_model(d: dict, train_model_fn):
    removed_games = []
    print("Creating model for entire game df")
    d['as_is'] = create_model(advanced_analysis_and_ml, train_model_fn)
    print("Creating model for each game")
    for game in tqdm(advanced_analysis_and_ml.steamId.unique(), total=len(advanced_analysis_and_ml.steamId.unique())):
        game_df: DataFrame = advanced_analysis_and_ml[advanced_analysis_and_ml.steamId==game]
        # Skip games with too few price records to train and test on
        if len(game_df.index) <= 5:
            removed_games.append(game)
            continue
        d[game] = create_model(game_df, train_model_fn)
    return d, removed_games
def scatter_plot_models(price_change_predictions_dict: dict):
    # Column 0 holds the model name (steamId or 'as_is'), column 1 its R2 score
    name_and_performance = DataFrame(map(lambda x: (x[0], x[1]['performance']['score']), price_change_predictions_dict.items()))
    failed_models = name_and_performance[name_and_performance[1]<=0]
    good_models = name_and_performance[name_and_performance[1]>0]
    print(len(good_models.index))
    plot = sns.stripplot(x=0, y=1, data=good_models)
    set_plot_styling(plot, {
        "title": "Performance ignoring failed models",
        "sq_size": 12,
        "y_label": "R2 Score",
        "x_label": 'Model',
        "remove_x_labels": True,
        'label_size': 20
    })
    print("Failed models:", failed_models)
def assert_models_performance(m):
    m['as_is']['performance'] = predict_evaluate_performance(**m['as_is'])
    for game in tqdm(advanced_analysis_and_ml.steamId.unique()):
        if game in removed_games:
            continue
        m[game]['performance'] = predict_evaluate_performance(**m[game])
def save_models(t: str, all_models_dict):
    for key, val in all_models_dict[t].items():
        model = val['model']
        save_model(t, key, model)
models = dict()
models['linear_regression'] = dict()
models['linear_regression'], removed_games = create_train_model(models['linear_regression'], train_linear_regression_model)
Creating model for entire game df
Creating model for each game
assert_models_performance(models['linear_regression'])
save_models('linear_regression', models)
scatter_plot_models(models['linear_regression'])
134
Failed models:
| | 0 | 1 |
|---|---|---|
| 1 | 359840 | -39.847836 |
| 2 | 1174180 | -0.009497 |
| 3 | 1091500 | -1.549635 |
| 4 | 442070 | -0.026645 |
| 5 | 21660 | -0.541458 |
| ... | ... | ... |
| 688 | 699170 | -1.455733 |
| 689 | 1004490 | -1.419224 |
| 690 | 305380 | -0.035983 |
| 692 | 1421760 | -0.044482 |
| 693 | 402880 | -0.073749 |
560 rows × 2 columns
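The "failed" bucket above is defined by an R2 score of zero or less. R2 compares a model's squared error with that of a baseline that always predicts the mean of the test targets, so a negative score means the model did worse than that constant baseline. The sketch below re-implements the standard definition with NumPy on made-up `price_change` values; it is meant to mirror what `r2_score` computes, not to replace it.

```python
import numpy as np

def r2(y_true, y_pred):
    # R2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical test targets: full price twice, two discounts.
y_test = [1.0, 0.5, 1.0, 0.2]

mean_baseline = [np.mean(y_test)] * len(y_test)   # always predict the mean
always_full_price = [1.0] * len(y_test)           # always predict "no discount"

print(r2(y_test, y_test))             # perfect predictions
print(r2(y_test, mean_baseline))      # the reference baseline
print(r2(y_test, always_full_price))  # worse than the baseline, so negative
```

A model that mostly predicts "no discount" can therefore land well below zero on a game whose test period contains several sales.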
models['random_forest'] = dict()
models['random_forest'], removed_games = create_train_model(models['random_forest'], train_random_forest_model)
Creating model for entire game df
Creating model for each game
save_models('random_forest', models)
assert_models_performance(models['random_forest'])
scatter_plot_models(models['random_forest'])
18
Failed models:
| | 0 | 1 |
|---|---|---|
| 1 | 359840 | -0.293786 |
| 2 | 1174180 | -0.236623 |
| 3 | 1091500 | -2.829372 |
| 4 | 442070 | -0.178463 |
| 6 | 361280 | -0.051556 |
| ... | ... | ... |
| 689 | 1004490 | -1.415007 |
| 690 | 305380 | -0.470491 |
| 691 | 696530 | -0.118253 |
| 692 | 1421760 | -1.578803 |
| 693 | 402880 | -0.003535 |
676 rows × 2 columns
models['xgboost'] = dict()
models['xgboost'], removed_games = create_train_model(models['xgboost'], train_xgboost)
Creating model for entire game df
Creating model for each game
save_models('xgboost', models)
assert_models_performance(models['xgboost'])
scatter_plot_models(models['xgboost'])
11
Failed models:
| | 0 | 1 |
|---|---|---|
| 1 | 359840 | -0.494933 |
| 2 | 1174180 | -0.870520 |
| 3 | 1091500 | -6.670066 |
| 4 | 442070 | -0.094166 |
| 5 | 21660 | -0.177135 |
| ... | ... | ... |
| 689 | 1004490 | -2.295307 |
| 690 | 305380 | -2.332843 |
| 691 | 696530 | -0.460151 |
| 692 | 1421760 | -4.396223 |
| 693 | 402880 | -0.768322 |
683 rows × 2 columns
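`create_model` splits with `shuffle=False`, so which rows end up in the training set depends entirely on row order. Judging by the table previews, each game's records appear to be stored newest-first, which would mean the models train on recent prices and are tested on older ones. If the goal is to predict future prices, one option is to sort by `record_date` before cutting. The helper below is a hypothetical sketch on made-up rows, using a 60/40 cut by position rather than `train_test_split` itself.

```python
import pandas as pd

# Hypothetical records for one game, stored newest-first like the previews.
toy = pd.DataFrame({
    'record_date':  pd.to_datetime(['2022-01-12', '2021-12-22', '2021-11-24',
                                    '2021-06-24', '2020-12-22']),
    'price_change': [0.1, 0.1, 0.1, 0.5, 0.25],
})

def chronological_split(df, train_fraction=0.6):
    # Sort oldest-first so every training row precedes every test row in time.
    ordered = df.sort_values('record_date')
    cut = int(len(ordered) * train_fraction)
    return ordered.iloc[:cut], ordered.iloc[cut:]

train, test = chronological_split(toy)
print(train['record_date'].max(), '<', test['record_date'].min())
```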
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import tensorflowjs as tfjs
def create_lstm_model(ndf):
    df = load_dataframe(ndf)
    train = df[df.index < '2020-01-01 00:00:01']
    test = df[df.index >= '2020-01-01 00:00:01']
    # scale data
    scaler = MinMaxScaler()
    scaler.fit(train)
    scaled_train = scaler.transform(train)
    scaled_test = scaler.transform(test)
    # define generator
    n_input = 20
    n_features = len(df.columns)
    # Assumption: the target is price_change, as in the other models. The original
    # passed every feature as the target while the network has a single output.
    target_idx = df.columns.get_loc('price_change')
    generator = TimeseriesGenerator(scaled_train, scaled_train[:, target_idx], length=n_input)
    # define model
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, input_shape=(n_input, n_features)))
    model.add(LSTM(90, return_sequences=True))
    model.add(LSTM(80))
    # Linear output for regression. A softmax over a single unit always outputs 1.0,
    # which keeps the loss constant across epochs.
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    return model, model.fit(generator,epochs=10)
lstm_model, lstm_fit_history = create_lstm_model(advanced_analysis_and_ml)
Epoch 1/10
1181/1181 [==============================] - 59s 46ms/step - loss: 0.6504
Epoch 2/10
1181/1181 [==============================] - 58s 50ms/step - loss: 0.6504
Epoch 3/10
1181/1181 [==============================] - 61s 51ms/step - loss: 0.6504
Epoch 4/10
1181/1181 [==============================] - 61s 51ms/step - loss: 0.6504
Epoch 5/10
1181/1181 [==============================] - 64s 54ms/step - loss: 0.6504
Epoch 6/10
1181/1181 [==============================] - 62s 52ms/step - loss: 0.6504
Epoch 7/10
1181/1181 [==============================] - 61s 52ms/step - loss: 0.6504
Epoch 8/10
1181/1181 [==============================] - 62s 52ms/step - loss: 0.6504
Epoch 9/10
1181/1181 [==============================] - 62s 53ms/step - loss: 0.6504
Epoch 10/10
1181/1181 [==============================] - 65s 55ms/step - loss: 0.6504
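A loss that stays at exactly 0.6504 for every epoch, as logged above, is what one would expect if the network's output never changes. One known way for that to happen is a softmax activation on a layer with a single unit: softmax normalises across the units of the layer, so with one unit the output is always 1.0 whatever the weights are, and no gradient flows back. The NumPy sketch below illustrates the activation only; it is not a trace of the Keras model.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Three samples, one output unit each, with very different pre-activations.
logits = np.array([[-3.2], [0.0], [5.7]])

outputs = softmax(logits)
print(outputs)  # every row is 1.0, independent of the input
```

For a regression target such as a scaled price ratio, a single unit with a linear (or sigmoid) activation avoids this problem.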
def plot_loss(history):
    # Plot the training loss recorded at the end of each epoch
    loss_per_epoch = history.history['loss']
    plt.plot(range(len(loss_per_epoch)),loss_per_epoch)
plot_loss(lstm_fit_history)
tfjs.converters.save_keras_model(lstm_model, "./results/models/deep-learning-model")